---
title: Reranker-Enhanced Search
description: Boost relevance by reordering vector hits with reranking models.
icon: "ranking-star"
---

Reranker-enhanced search adds a second scoring pass after vector retrieval so Mem0 can return the most relevant memories first. Enable it when vector similarity alone misses nuance or when you need the highest-confidence context for an agent decision.

<Info>
  **You’ll use this when…**
  - Queries are nuanced and require semantic understanding beyond vector distance.
  - Large memory collections produce too many near matches to review manually.
  - You want consistent scoring across providers by delegating ranking to a dedicated model.
</Info>

<Warning>
  Reranking raises latency and, for hosted models, API spend. Benchmark with production traffic and define a fallback path for latency-sensitive requests.
</Warning>

<Note>
  All configuration snippets translate directly to the TypeScript SDK—swap dictionaries for objects while keeping the same keys (`provider`, `config`, `rerank` flags).
</Note>

---

## Feature anatomy

- **Initial vector search:** Retrieve candidate memories by similarity.
- **Reranker pass:** A specialized model scores each candidate against the original query.
- **Reordered results:** Mem0 sorts responses using the reranker’s scores before returning them.
- **Optional fallbacks:** Toggle reranking per request or disable it entirely if performance or cost becomes a concern.
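The two-pass flow above can be sketched in plain Python. This is a toy illustration only: the scores are hard-coded stand-ins, not Mem0 internals or a real model.

```python
# Toy sketch of the two-pass flow: vector retrieval, then a reranker pass.

def vector_search(memories, top_k=3):
    # Pass 1: a cheap similarity search returns a candidate pool.
    scored = [(m["text"], m["vector_score"]) for m in memories]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def rerank(candidates, rerank_scores):
    # Pass 2: a dedicated model rescores each candidate against the query.
    rescored = [(text, rerank_scores[text]) for text, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored

memories = [
    {"text": "likes sushi", "vector_score": 0.91},
    {"text": "works at Acme", "vector_score": 0.88},
    {"text": "allergic to peanuts", "vector_score": 0.85},
]
# Stand-in reranker scores for the query "food allergies".
rerank_scores = {"likes sushi": 0.4, "works at Acme": 0.2, "allergic to peanuts": 0.95}

candidates = vector_search(memories)
reordered = rerank(candidates, rerank_scores)
# Reranking promotes "allergic to peanuts" despite its lower vector score.
```

The point of the sketch: the candidate pool never changes, only its order, which is why reranking cost scales with pool size (`top_k`) rather than collection size.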

<AccordionGroup>
  <Accordion title="Supported providers">
    - **[Cohere](/components/rerankers/models/cohere)** – Multilingual hosted reranker with API-based scoring.  
    - **[Sentence Transformer](/components/rerankers/models/sentence_transformer)** – Local Hugging Face cross-encoders for GPU or CPU.  
    - **[Hugging Face](/components/rerankers/models/huggingface)** – Bring any hosted or on-prem reranker model ID.  
    - **[LLM Reranker](/components/rerankers/models/llm_reranker)** – Use your preferred LLM (OpenAI, etc.) for prompt-driven scoring.  
    - **[Zero Entropy](/components/rerankers/models/zero_entropy)** – High-quality neural reranking tuned for retrieval tasks.
  </Accordion>
  <Accordion title="Provider comparison">
    | Provider | Latency | Quality | Cost | Local deploy |
    | --- | --- | --- | --- | --- |
    | Cohere | Medium | High | API cost | ❌ |
    | Sentence Transformer | Low | Good | Free | ✅ |
    | Hugging Face | Low–Medium | Variable | Free | ✅ |
    | LLM Reranker | High | Very high | API cost | Depends |
  </Accordion>
</AccordionGroup>

---

## Configure it

### Basic setup

```python
from mem0 import Memory

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)
```

<Info icon="check">
  Confirm `results["results"][0]["score"]` reflects the reranker output—if the field is missing, the reranker was not applied.
</Info>

<Tip>
  Set `top_k` to the smallest candidate pool that still captures relevant hits. Smaller pools keep reranking costs down.
</Tip>

### Provider-specific options

```python
# Cohere reranker
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 10,
            "return_documents": True
        }
    }
}

# Sentence Transformer reranker
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "max_length": 512
        }
    }
}

# Hugging Face reranker
config = {
    "reranker": {
        "provider": "huggingface",
        "config": {
            "model": "BAAI/bge-reranker-base",
            "device": "cuda",
            "batch_size": 32
        }
    }
}

# LLM-based reranker
config = {
    "reranker": {
        "provider": "llm_reranker",
        "config": {
            "llm": {
                "provider": "openai",
                "config": {
                    "model": "gpt-4",
                    "api_key": "your-openai-api-key"
                }
            },
            "top_k": 5
        }
    }
}
```

<Note>
  Keep authentication keys in environment variables when you plug these configs into production projects.
</Note>
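One way to do that is to read the key from the environment at startup. `COHERE_API_KEY` is an assumed variable name here; match it to whatever your deployment already defines.

```python
import os

# Keep secrets out of source control: resolve the key from the environment.
cohere_key = os.environ.get("COHERE_API_KEY", "")

config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": cohere_key,
        }
    }
}
```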

### Full stack example

```python
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",
            "api_key": "your-openai-api-key"
        }
    },
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key",
            "top_k": 15,
            "return_documents": True
        }
    }
}

m = Memory.from_config(config)
```

<Info icon="check">
  A quick search should now return results with both vector and reranker scores, letting you compare improvements immediately.
</Info>

### Async support

```python
import asyncio

from mem0 import AsyncMemory

async_memory = AsyncMemory.from_config(config)

async def search_with_rerank():
    return await async_memory.search(
        "What are my preferences?",
        user_id="alice",
        rerank=True
    )

results = asyncio.run(search_with_rerank())
```

<Info icon="check">
  Inspect the async response to confirm reranking still applies; the scores should match the synchronous implementation.
</Info>

### Tune performance and cost

```python
# GPU-friendly local reranker configuration
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",
            "batch_size": 32,
            "top_k": 10,
            "max_length": 256
        }
    }
}

# Smart toggle for hosted rerankers
def smart_search(query, user_id, use_rerank=None):
    if use_rerank is None:
        use_rerank = len(query.split()) > 3
    return m.search(query, user_id=user_id, rerank=use_rerank)
```

<Tip>
  Use heuristics (query length, user tier) to decide when to rerank so high-signal queries benefit without taxing every request.
</Tip>

### Handle failures gracefully

```python
try:
    results = m.search("test query", user_id="alice", rerank=True)
except Exception as exc:
    print(f"Reranking failed: {exc}")
    results = m.search("test query", user_id="alice", rerank=False)
```

<Warning>
  Always fall back to vector-only search: a failed request hurts users far more than slightly less relevant ordering.
</Warning>

### Migrate from v0.x

```python
# Before: basic vector search
results = m.search("query", user_id="alice")

# After: same API with reranking enabled via config
config = {
    "reranker": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
        }
    }
}

m = Memory.from_config(config)
results = m.search("query", user_id="alice")
```

---

## See it in action

### Basic reranked search

```python
results = m.search(
    "What are my food preferences?",
    user_id="alice"
)

for result in results["results"]:
    print(f"Memory: {result['memory']}")
    print(f"Score: {result['score']}")
```

<Info icon="check">
  Expect each result to list the reranker-adjusted score so you can compare ordering against baseline vector results.
</Info>

### Toggle reranking per request

```python
results_with_rerank = m.search(
    "What movies do I like?",
    user_id="alice",
    rerank=True
)

results_without_rerank = m.search(
    "What movies do I like?",
    user_id="alice",
    rerank=False
)
```

<Tip>
  Log the reranked vs. non-reranked lists during rollout so stakeholders can see the improvement before enforcing it everywhere.
</Tip>

<Info icon="check">
  You should see the same memories in both lists, but the reranked response will reorder them based on semantic relevance.
</Info>
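A lightweight way to quantify that reordering during rollout is to compare result positions directly. The helper below is hypothetical, not part of the SDK; it takes two lists of memory ids in result order.

```python
def rank_shift(baseline, reranked):
    """Map each memory id to how many positions reranking moved it.

    Positive values mean the reranker promoted the memory;
    zero means its position was unchanged.
    """
    base_pos = {mem_id: i for i, mem_id in enumerate(baseline)}
    return {mem_id: base_pos[mem_id] - i for i, mem_id in enumerate(reranked)}

# Toy example: "m3" climbs two places after reranking.
shifts = rank_shift(["m1", "m2", "m3"], ["m3", "m1", "m2"])
print(shifts)  # {'m3': 2, 'm1': -1, 'm2': -1}
```

Feed it the ids from `results_with_rerank` and `results_without_rerank` to get a per-memory view of what reranking changed.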

### Combine with metadata filters

```python
results = m.search(
    "important work tasks",
    user_id="alice",
    filters={
        "AND": [
            {"category": "work"},
            {"priority": {"gte": 7}}
        ]
    },
    rerank=True,
    limit=20
)
```

<Info icon="check">
  Verify filtered reranked searches still respect every metadata clause—reranking only reorders candidates, it never bypasses filters.
</Info>
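If you want a belt-and-suspenders check in tests, a small verifier over returned payloads works. This is an illustrative checker for the AND-of-clauses shape used above (equality and `gte` only), not the SDK's filter engine.

```python
def matches_filters(memory, clauses):
    # Returns True only if the memory satisfies every clause.
    for clause in clauses:
        for field, cond in clause.items():
            value = memory.get(field)
            if isinstance(cond, dict):
                if "gte" in cond and not (value is not None and value >= cond["gte"]):
                    return False
            elif value != cond:
                return False
    return True

clauses = [{"category": "work"}, {"priority": {"gte": 7}}]
results = [
    {"memory": "ship release notes", "category": "work", "priority": 9},
    {"memory": "buy groceries", "category": "personal", "priority": 8},
]
kept = [r for r in results if matches_filters(r, clauses)]
# Only the work item with priority >= 7 survives.
```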

### Real-world playbooks

#### Customer support

```python
config = {
    "reranker": {
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",
            "api_key": "your-cohere-api-key"
        }
    }
}

m = Memory.from_config(config)

results = m.search(
    "customer having login issues with mobile app",
    agent_id="support_bot",
    filters={"category": "technical_support"},
    rerank=True
)
```

<Info icon="check">
  Top results should highlight tickets matching the login issue context so agents can respond faster.
</Info>

#### Content recommendation

```python
results = m.search(
    "science fiction books with space exploration themes",
    user_id="reader123",
    filters={"content_type": "book_recommendation"},
    rerank=True,
    limit=10
)

for result in results["results"]:
    print(f"Recommendation: {result['memory']}")
    print(f"Relevance: {result['score']:.3f}")
```

<Info icon="check">
  Expect high-scoring recommendations that match both the requested theme and any metadata limits you applied.
</Info>

#### Personal assistant

```python
results = m.search(
    "What restaurants did I enjoy last month that had good vegetarian options?",
    user_id="foodie_user",
    filters={
        "AND": [
            {"category": "dining"},
            {"rating": {"gte": 4}},
            {"date": {"gte": "2024-01-01"}}
        ]
    },
    rerank=True
)
```

<Tip>
  Reuse this pattern for other lifestyle queries—swap the filters and prompt text without changing the rerank configuration.
</Tip>

<Note>
  Each workflow keeps the same `m.search(...)` signature, so you can template these queries across agents with only the prompt and filters changing.
</Note>

---

## Verify the feature is working

- Inspect result payloads for both `score` (vector) and reranker scores; mismatched fields indicate the reranker didn’t execute.
- Track latency before and after enabling reranking to ensure SLAs hold.
- Review provider logs or dashboards for throttling or quota warnings.
- Run A/B comparisons (rerank on/off) to validate improved relevance before defaulting to reranked responses.
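For the latency check, a minimal timing wrapper is enough to compare rerank-on vs. rerank-off calls (plain Python, not an SDK feature; the `m.search` usage below assumes a configured `Memory` instance).

```python
import time

def timed(search_fn, *args, **kwargs):
    # Wrap any search call to record wall-clock duration alongside results.
    start = time.perf_counter()
    results = search_fn(*args, **kwargs)
    return results, time.perf_counter() - start

# Usage against a real Memory instance:
# reranked, rerank_s = timed(m.search, "query", user_id="alice", rerank=True)
# baseline, base_s = timed(m.search, "query", user_id="alice", rerank=False)

# Demo with a stand-in function:
results, seconds = timed(lambda q: [q.upper()], "hello")
```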

---

## Best practices

1. **Start local:** Try Sentence Transformer models to prove value before paying for hosted APIs.
2. **Monitor latency:** Add metrics around reranker duration so you notice regressions quickly.
3. **Control spend:** Use `top_k` and selective toggles to cap hosted reranker costs.
4. **Keep a fallback:** Always catch reranker failures and continue with vector-only ordering.
5. **Experiment often:** Swap providers or models to find the best fit for your domain and language mix.

---

<CardGroup cols={2}>
  <Card title="Configure Rerankers" icon="sliders" href="/components/rerankers/config">
    Review provider fields, defaults, and environment variables before going live.
  </Card>
  <Card title="Build a Custom LLM Reranker" icon="sparkles" href="/components/rerankers/models/llm_reranker">
    Extend scoring with prompt-tuned LLM rerankers for niche workflows.
  </Card>
</CardGroup>
